74 research outputs found
Superpixel-based Semantic Segmentation Trained by Statistical Process Control
Semantic segmentation, like other fields of computer vision, has seen remarkable performance gains from deep convolutional neural networks. However, because neighboring pixels are strongly dependent on each other, both training and testing of these methods involve many redundant operations. To resolve this problem, the proposed network is trained and tested with only 0.37% of the total pixels via superpixel-based sampling, which greatly reduces the complexity of the upsampling computation. The hypercolumn feature maps are constructed by a pyramid module in combination with the convolution layers of the base network. Since the proposed method uses a very small number of sampled pixels, end-to-end learning of the entire network is difficult with a common learning rate for all layers. To resolve this, the learning rate after sampling is controlled by statistical process control (SPC) of the gradients in each layer. The proposed method performs better than or on par with conventional methods that use far more samples on the Pascal Context and SUN-RGBD datasets. Comment: Accepted in British Machine Vision Conference (BMVC), 201
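The per-layer SPC idea above can be illustrated with a small sketch. Everything here is an assumption for illustration only: the function name `spc_adjust_lr`, the 3-sigma control limits, and the halving policy are not the paper's exact procedure, which the abstract does not specify.

```python
import statistics

def spc_adjust_lr(grad_norm_history, base_lr, k=3.0, decay=0.5):
    # For each layer, compute control limits (mean +/- k * std) from the
    # past gradient norms, and scale down the learning rate of any layer
    # whose latest gradient norm falls outside those limits.
    lrs = {}
    for layer, history in grad_norm_history.items():
        past, latest = history[:-1], history[-1]
        mu = statistics.mean(past)
        sigma = statistics.pstdev(past)
        out_of_control = abs(latest - mu) > k * sigma
        lrs[layer] = base_lr * decay if out_of_control else base_lr
    return lrs
```

A layer whose latest gradient norm jumps far outside its own historical control limits gets a reduced learning rate, while in-control layers keep the common base rate.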
Efficient and effective human action recognition in video through motion boundary description with a compact set of trajectories
Human action recognition (HAR) is at the core of human-computer interaction and video scene understanding. However, achieving effective HAR in an unconstrained environment is still a challenging task. To that end, trajectory-based video representations are currently widely used. Despite the promising levels of effectiveness achieved by these approaches, problems regarding computational complexity and the presence of redundant trajectories still need to be addressed in a satisfactory way. In this paper, we propose a method for trajectory rejection, reducing the number of redundant trajectories without degrading the effectiveness of HAR. Furthermore, to realize efficient optical flow estimation prior to trajectory extraction, we integrate a method for dynamic frame skipping. Experiments with four publicly available human action datasets show that the proposed approach outperforms state-of-the-art HAR approaches in terms of effectiveness, while simultaneously mitigating the computational complexity.
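A minimal sketch of the trajectory-rejection idea, assuming a simple displacement threshold as the rejection criterion (the paper's actual criterion is more elaborate; `reject_redundant_trajectories` and the threshold value are hypothetical):

```python
import math

def reject_redundant_trajectories(trajectories, min_displacement=1.0):
    # Keep only trajectories whose end-to-end displacement exceeds a
    # threshold; near-static trajectories carry little motion information
    # and mostly add computational cost.
    kept = []
    for traj in trajectories:
        (x0, y0), (x1, y1) = traj[0], traj[-1]
        if math.hypot(x1 - x0, y1 - y0) >= min_displacement:
            kept.append(traj)
    return kept
```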
Tell Me What They're Holding: Weakly-supervised Object Detection with Transferable Knowledge from Human-object Interaction
In this work, we introduce a novel weakly supervised object detection (WSOD) paradigm to detect objects belonging to rare classes with few examples, using transferable knowledge from human-object interactions (HOI). While WSOD shows lower performance than full supervision, we focus on HOI as the main context that can strongly supervise complex semantics in images. We therefore propose a novel module called RRPN (relational region proposal network), which outputs an object-localizing attention map using only human poses and action verbs. In the source domain, we fully train an object detector and the RRPN with full supervision of HOI. With the localization knowledge transferred from the trained RRPN, a new object detector can learn unseen objects with weak verbal supervision of HOI, without bounding box annotations in the target domain. Because the RRPN is designed as an add-on module, we can apply it not only to object detection but also to other domains such as semantic segmentation. The experimental results on the HICO-DET dataset show the possibility that the proposed method can be a cheap alternative to the current supervised object detection paradigm. Moreover, qualitative results demonstrate that our model can properly localize unseen objects on the HICO-DET and V-COCO datasets. Comment: AAAI 2020 Oral, Camera Ready
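The intuition behind a pose-conditioned localization prior can be sketched as follows. This is illustrative only: the real RRPN is a learned network conditioned on poses and action verbs, whereas `pose_attention_map` below is just a hand-crafted Gaussian bump around a hand keypoint, i.e. "the held object tends to be near the hand".

```python
import math

def pose_attention_map(grid_w, grid_h, hand_xy, sigma=1.5):
    # Place a normalized Gaussian bump around the hand keypoint: a crude
    # stand-in for an attention map that localizes the interacted object.
    hx, hy = hand_xy
    raw = [[math.exp(-((x - hx) ** 2 + (y - hy) ** 2) / (2 * sigma ** 2))
            for x in range(grid_w)] for y in range(grid_h)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]
```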
MAMo: Leveraging Memory and Attention for Monocular Video Depth Estimation
We propose MAMo, a novel memory and attention framework for monocular video depth estimation. MAMo can augment and improve any single-image depth estimation network into a video depth estimation model, enabling it to exploit temporal information to predict more accurate depth. In MAMo, we augment the model with a memory that aids depth prediction as the model streams through the video. Specifically, the memory stores learned visual and displacement tokens from previous time instances, which allows the depth network to cross-reference relevant features from the past when predicting depth for the current frame. We introduce a novel scheme to continuously update the memory, optimizing it to keep tokens that correspond to both past and present visual information. We adopt an attention-based approach to process the memory features: we first learn the spatio-temporal relations among the visual and displacement memory tokens using a self-attention module, and then aggregate the output features of self-attention with the current visual features through cross-attention. The cross-attended features are finally given to a decoder that predicts depth for the current frame. Through extensive experiments on several benchmarks, including KITTI, NYU-Depth V2, and DDAD, we show that MAMo consistently improves monocular depth estimation networks and sets new state-of-the-art (SOTA) accuracy. Notably, our MAMo video depth estimation achieves higher accuracy with lower latency compared to SOTA cost-volume-based video depth models. Comment: Accepted at ICCV 202
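The token-retention idea in the memory update can be sketched with a toy policy. This is an assumption for illustration, not MAMo's learned update rule: `update_memory` simply keeps the tokens most similar to the current visual token under cosine similarity, bounded by a fixed capacity.

```python
import math

def cosine(a, b):
    # Cosine similarity between two non-zero feature vectors.
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def update_memory(memory, current, capacity=3):
    # Retain the tokens most similar to the current visual token, so the
    # memory stays relevant to both past and present observations.
    pool = memory + [current]
    pool.sort(key=lambda tok: cosine(tok, current), reverse=True)
    return pool[:capacity]
```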
DIFT: Dynamic Iterative Field Transforms for Memory Efficient Optical Flow
Recent advancements in neural network-based optical flow estimation often come with prohibitively high computational and memory requirements, presenting challenges for model adaptation to mobile and low-power use cases. In this paper, we introduce a lightweight, low-latency, and memory-efficient model, Dynamic Iterative Field Transforms (DIFT), for optical flow estimation feasible for edge applications such as mobile devices, XR, micro UAVs, robotics, and cameras. DIFT follows an iterative refinement framework, leveraging variable-resolution cost volumes for correspondence estimation. We propose a memory-efficient solution for cost volume processing to reduce peak memory. We also present a novel dynamic coarse-to-fine cost volume processing scheme across the various stages of refinement to avoid maintaining multiple levels of cost volumes. We demonstrate the first real-time cost-volume-based optical flow DL architecture on the Snapdragon 8 Gen 1 HTP efficient mobile AI accelerator, achieving 32 inf/sec and 5.89 EPE (endpoint error) on KITTI with a manageable accuracy-performance tradeoff. Comment: CVPR MAI 2023 Accepted Paper
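The memory-saving intuition behind local cost volume processing can be sketched as follows. This is a generic local-correlation sketch, not DIFT's exact operator: `local_cost_volume` and the scalar per-pixel features are simplifying assumptions.

```python
def local_cost_volume(f1, f2, radius):
    # Correlate each pixel of f1 only with a (2r+1) x (2r+1) neighborhood
    # in f2, so peak memory grows with the lookup radius rather than with
    # an all-pairs (H*W) x (H*W) volume. Out-of-bounds entries stay zero.
    h, w = len(f1), len(f1[0])
    cost = [[[0.0] * ((2 * radius + 1) ** 2) for _ in range(w)]
            for _ in range(h)]
    for y in range(h):
        for x in range(w):
            k = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        cost[y][x][k] = f1[y][x] * f2[yy][xx]
                    k += 1
    return cost
```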
Prevalence of Human Papilloma Virus Infections and Cervical Cytological Abnormalities among Korean Women with Systemic Lupus Erythematosus
We performed a multicenter cross-sectional study of 134 sexually active systemic lupus erythematosus (SLE) patients to investigate the prevalence of and risk factors for high-risk human papilloma virus (HPV) infection and cervical cytological abnormalities among Korean women with SLE. HPV testing and routine cervical cytologic examination were performed, and HPV was typed using a hybrid method or the polymerase chain reaction. Data on 4,595 healthy women were used for comparison. SLE patients had a higher prevalence of high-risk HPV infection (24.6% vs. 7.9%, P<0.001, odds ratio 3.8, 95% confidence interval 2.5-5.7) and of abnormal cervical cytology (16.4% vs. 2.8%, P<0.001, OR 4.4, 95% CI 2.5-7.8) than controls. SLE itself was identified as an independent risk factor for high-risk HPV infection among Korean women (OR 3.8, 95% CI 2.5-5.7), along with having ≥2 sexual partners (OR 8.5, 95% CI 1.2-61.6) and Pap smear abnormalities (OR 97.3, 95% CI 6.5-1,456.7). High-risk HPV infection and cervical cytological abnormalities were more common among Korean women with SLE than among controls. SLE itself may be a risk factor for HPV infection among Korean women, suggesting the importance of close monitoring of HPV infections and abnormal Pap smears in SLE patients.
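The reported odds ratio can be reproduced with the standard 2x2-table formula. The cell counts below are reconstructed from the reported percentages (24.6% of 134 ≈ 33 infected SLE patients; 7.9% of 4,595 ≈ 363 infected controls) and are an assumption, not figures stated in the abstract.

```python
import math

def odds_ratio_ci(a, b, c, d):
    # 2x2 table: a/b = infected/uninfected in the exposed group,
    # c/d = infected/uninfected in the control group. The 95% CI uses
    # the normal approximation on the log odds ratio.
    or_ = (a / b) / (c / d)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - 1.96 * se)
    hi = math.exp(math.log(or_) + 1.96 * se)
    return or_, lo, hi
```

With the reconstructed counts, `odds_ratio_ci(33, 101, 363, 4232)` recovers approximately OR 3.8 (95% CI 2.5-5.7), matching the abstract.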